 imbalanced class


Review for NeurIPS paper: Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning

Neural Information Processing Systems

This paper proposes an approach to semi-supervised learning for imbalanced classes. It is indeed non-trivial to combine local/global/perturbation consistency-based semi-supervised methods with fully supervised methods for imbalanced classes; this paper may be the first work in this direction. The approach is quite general and can be applied on top of any pseudo-labeling-based semi-supervised method. It first estimates the true class-prior probability and then modifies the pseudo-labels, pushing their class distribution toward that estimate through a constrained convex optimization. While the reviewers initially had some concerns (mainly about clarity and the small number of datasets), the authors did a particularly good job in their rebuttal, showing that the class-prior probability can be estimated rather than having to be given.
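To make the idea of moving pseudo-labels toward a class prior concrete, here is a minimal sketch. It is not the constrained convex optimization the paper proposes; it swaps in a much simpler rescale-and-renormalize step (the kind of distribution alignment used in ReMixMatch). The function name `align_pseudo_labels` and the toy numbers are assumptions made for illustration.

```python
def align_pseudo_labels(pseudo, target_prior, eps=1e-12):
    """Rescale soft pseudo-labels toward a target class prior.

    Simplified stand-in (an assumption, not the paper's method): each class
    probability is multiplied by target_prior / current_mean, then every row
    is renormalized so it sums to 1 again.
    """
    n = len(pseudo)
    k = len(target_prior)
    # Current class distribution implied by the pseudo-labels.
    current = [sum(row[c] for row in pseudo) / n for c in range(k)]
    aligned = []
    for row in pseudo:
        scaled = [row[c] * target_prior[c] / max(current[c], eps) for c in range(k)]
        z = sum(scaled)
        aligned.append([s / z for s in scaled])
    return aligned


# Toy pseudo-labels that are biased toward class 0.
pseudo = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4]]
aligned = align_pseudo_labels(pseudo, target_prior=[0.5, 0.5])
for before, after in zip(pseudo, aligned):
    print(before, "->", [round(p, 3) for p in after])
```

In this toy case every row gains probability mass on the under-predicted class, which is the qualitative effect the review describes.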


Locality-preserving Directions for Interpreting the Latent Space of Satellite Image GANs

arXiv.org Artificial Intelligence

We present a locality-aware method for interpreting the latent space of wavelet-based Generative Adversarial Networks (GANs) that can capture the large spatial and spectral variability characteristic of satellite imagery. By focusing on preserving locality, the proposed method is able to decompose the weight-space of pre-trained GANs and recover interpretable directions that correspond to high-level semantic concepts (such as urbanization, structure density, flora presence) and that can subsequently be used for guided synthesis of satellite imagery. In contrast to typically used approaches that focus on capturing the variability of the weight-space in a reduced dimensionality space (i.e., based on Principal Component Analysis, PCA), we show that preserving locality leads to vectors with different angles, which are more robust to artifacts and can better preserve class information. Via a set of quantitative and qualitative examples, we further show that the proposed approach can outperform both baseline geometric augmentations and global, PCA-based approaches for data synthesis in the context of data augmentation for satellite scene classification.


How to Handle Imbalanced Classes in Machine Learning

#artificialintelligence

Imbalanced classes put "accuracy" out of business. This is a surprisingly common problem in machine learning (specifically in classification), occurring in datasets with a disproportionate ratio of observations in each class. Standard accuracy no longer reliably measures performance, which makes model training much trickier. Up-sampling the minority class means oversampling the under-represented class in a binary classification problem to balance the class distribution. The idea is to randomly duplicate examples from the minority class, increasing its representation in the dataset until the class distribution is more balanced.
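The up-sampling idea described above can be sketched in a few lines of standard-library Python. The helper name `upsample_minority` and the toy data are assumptions for illustration; in practice a library such as imbalanced-learn would normally be used instead.

```python
import random
from collections import Counter


def upsample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("this sketch handles binary labels only")
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    # Sample with replacement from the minority class to close the gap.
    extra = [rng.choice(minority_rows) for _ in range(n_major - n_minor)]
    return list(rows) + extra, list(labels) + [minority] * len(extra)


# Hypothetical toy data: 8 majority examples (label 0), 2 minority examples (label 1).
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [5.0], [6.0]]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = upsample_minority(rows, labels)
print(Counter(balanced_labels))
```

Up-sampling should be applied to the training split only; duplicating rows before the train/test split would leak copies of test examples into training.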


Machine Learning Performance Analysis to Predict Stroke Based on Imbalanced Medical Dataset

arXiv.org Artificial Intelligence

Cerebral stroke, the second leading cause of death worldwide, has been a primary public health concern over the last few years. With the help of machine learning techniques, early detection of stroke warning signs becomes possible, which can help prevent a stroke or reduce its impact. Medical datasets, however, are frequently imbalanced in their class labels, so minority classes tend to be predicted poorly. In this paper, the potential risk factors for stroke are investigated. Moreover, four distinct approaches are applied to improve the classification of the minority class in the imbalanced stroke dataset, and their performance is compared: the ensemble weight voting classifier, the Synthetic Minority Over-sampling Technique (SMOTE), Principal Component Analysis with K-Means Clustering (PCA-Kmeans), and Focal Loss with a Deep Neural Network (DNN). The analysis shows that SMOTE and PCA-Kmeans with DNN-Focal Loss work best for a severely imbalanced dataset of limited size (e.g., the Stroke dataset), outperforming Kaggle's work by a factor of 2-4.
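As an illustration of one of the techniques named above, the following is a minimal sketch of the binary focal loss in its commonly cited form, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults gamma=2 and alpha=0.25 are the usual ones from the focal loss literature and are an assumption here; the abstract does not state which settings the paper uses.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example.

    p: predicted probability of the positive class, y: true label (0 or 1).
    gamma down-weights easy examples; alpha weights the positive class.
    """
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-dependent weight
    p_t = min(max(p_t, eps), 1.0)               # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction contributes far less than a badly wrong one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(f"easy example loss: {easy:.6f}, hard example loss: {hard:.6f}")
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a quick sanity check on any implementation.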


How to evaluate a machine learning model - part 4- Edvancer Eduventures

#artificialintelligence

This blog post is the continuation of my previous articles, part 1, part 2 and part 3. Caution: the difference between training metrics and evaluation metrics. Sometimes, the model training procedure uses a different metric (also known as a loss function) than the evaluation. This can happen when we are re-appropriating a model for a different task than it was designed for. For example, we might train a personalized recommender by minimizing the loss between its predictions and observed ratings, and then use this recommender to produce a ranked list of recommendations. This is not an optimal scenario. It makes the life of the model difficult by asking it to do a task that it was not trained to do.
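A small sketch can show how the two kinds of metric may disagree. The ratings and the two sets of predictions below are made up for illustration, and the choice of squared error for training and precision@k for evaluation is an assumption; the post does not name specific metrics.

```python
def mse(y_true, y_pred):
    """Mean squared error: a typical training loss for rating prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_at_k(y_true, y_pred, k, relevant_threshold=4):
    """Fraction of the top-k ranked items that are actually relevant."""
    ranked = sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)
    top_k = ranked[:k]
    return sum(1 for i in top_k if y_true[i] >= relevant_threshold) / k


# Hypothetical observed ratings for four items (items 0 and 1 are relevant).
ratings = [5, 4, 1, 1]
model_a = [4.0, 3.0, 3.5, 1.0]  # closer to the ratings on average
model_b = [9.0, 8.0, 1.0, 1.0]  # badly calibrated, but orders the items correctly

for name, preds in [("A", model_a), ("B", model_b)]:
    print(name, "MSE:", mse(ratings, preds), "P@2:", precision_at_k(ratings, preds, 2))
```

Model A wins on the training-style metric while model B produces the better ranked list, which is the mismatch the caution is about.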


Classification with Imbalanced Data

#artificialintelligence

Building classification models on data that has largely imbalanced classes can be difficult. Using techniques such as oversampling, undersampling, resampling combinations, and custom filtering can improve accuracy. In this article, I'll walk through a few different approaches to deal with data imbalance in classification tasks. To demonstrate various class imbalance techniques, a fictitious dataset of credit card defaults will be used. In our scenario, we are trying to build an explainable classifier that takes two inputs (age and card balance) and predicts whether someone will miss an upcoming payment.
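Of the techniques listed above, random undersampling is the simplest to sketch. The records below are randomly generated stand-ins for the article's fictitious credit card data; the field names `age`, `balance`, and `default`, and the helper `undersample`, are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def undersample(records, label_key="default", seed=0):
    """Randomly drop records from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)
    n = min(len(recs) for recs in by_class.values())
    sample = []
    for recs in by_class.values():
        sample.extend(rng.sample(recs, n))  # sample without replacement
    rng.shuffle(sample)
    return sample


# Hypothetical dataset: 95 customers who paid, 5 who missed a payment.
gen = random.Random(1)
records = [
    {"age": gen.randint(21, 70), "balance": round(gen.uniform(0, 5000), 2), "default": 0}
    for _ in range(95)
]
records += [
    {"age": gen.randint(21, 70), "balance": round(gen.uniform(0, 5000), 2), "default": 1}
    for _ in range(5)
]

balanced = undersample(records)
print(Counter(rec["default"] for rec in balanced))
```

Undersampling discards majority-class information, so it tends to suit cases where the majority class is large enough that the remaining sample is still representative.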


Deep Metric Learning Model for Imbalanced Fault Diagnosis

arXiv.org Artificial Intelligence

Intelligent, data-driven diagnosis based on deep learning has become an attractive and meaningful field in recent years. However, in practical application scenarios, the imbalance of time-series fault data is an urgent problem to be solved. This paper proposes a novel deep metric learning model that considers imbalanced fault data and a quadruplet data-pair design. Based on such data pairs, a quadruplet loss function that takes into account the inter-class distance and the intra-class data distribution is proposed. This quadruplet loss pays special attention to imbalanced sample pairs. A reasonable combination of the quadruplet loss and the softmax loss function can reduce the impact of imbalance. Experimental results on two open-source datasets show that the proposed method can effectively and robustly improve the performance of imbalanced fault diagnosis.
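For readers unfamiliar with quadruplet losses, here is a minimal sketch of the generic form from the metric learning literature. It is not the loss proposed in this paper, which additionally accounts for intra-class data distribution and imbalanced pairs; the margins and the toy embeddings are assumptions.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quadruplet_loss(anchor, positive, negative1, negative2, margin1=1.0, margin2=0.5):
    """Generic quadruplet loss (a sketch, not the paper's variant).

    anchor and positive share a class; negative1 and negative2 come from two
    other, different classes. The first term is the usual triplet constraint;
    the second pushes the anchor-positive distance below the distance between
    the two negatives.
    """
    d_ap = sq_dist(anchor, positive)
    d_an = sq_dist(anchor, negative1)
    d_nn = sq_dist(negative1, negative2)
    return max(0.0, d_ap - d_an + margin1) + max(0.0, d_ap - d_nn + margin2)


# Well-separated embeddings incur no loss; overlapping ones are penalized.
separated = quadruplet_loss([0, 0], [0.1, 0], [3, 0], [0, 3])
overlapping = quadruplet_loss([0, 0], [2, 0], [1, 0], [1, 0.5])
print(separated, overlapping)
```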


What is Data Imbalance in Machine Learning?

#artificialintelligence

Data imbalance, or imbalanced classes, is a common problem in machine learning classification where the training dataset contains a disproportionate ratio of samples in each class. Examples of real-world scenarios that suffer from class imbalance include threat detection, medical diagnosis, and spam filtering. Class imbalance can make it difficult to train effective machine learning models, especially when there aren't enough samples belonging to the class of interest. In the case of fraud detection, the number of fraudulent transactions is negligible compared with the number of lawful transactions, making it difficult to train a machine learning model because the training dataset does not contain enough information about fraud.
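The fraud scenario can be illustrated with a few lines of Python. The counts below (10 fraudulent transactions out of 10,000) are made up for illustration; the point is only that a model which never flags fraud still scores very high accuracy.

```python
# Hypothetical labels: 1 = fraudulent, 0 = lawful.
y_true = [1] * 10 + [0] * 9990

# A trivial "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
false_negatives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy = {accuracy:.3f}, fraud recall = {recall:.3f}")
```

The accuracy is 0.999 even though not a single fraudulent transaction is caught, which is why metrics such as recall on the class of interest are reported alongside or instead of accuracy.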


A walk through imbalanced classes in machine learning through a visual cheat sheet

#artificialintelligence

There are many detailed articles explaining the problem of imbalanced training samples and how to cope with it. In this article, I summarize my understanding of the problem in a visual cheat sheet. I find it useful because it comes in handy whenever I have to go back to the basic definitions (or have an interview lined up). The cheat sheet below starts with background on why accuracy doesn't always give correct insight into your classification algorithm and then moves on to defining other meaningful performance metrics. The cheat sheet then provides an example showing how to calculate those metrics for a three-class classification problem.
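The cheat sheet itself is not reproduced here, so the following is only a sketch of how such metrics are typically computed for a three-class problem. The confusion matrix is made up for illustration, with rows as actual classes and columns as predicted classes.

```python
# Hypothetical confusion matrix: confusion[actual][predicted].
confusion = [
    [50, 3, 2],   # actual class 0 (55 examples)
    [10, 20, 5],  # actual class 1 (35 examples)
    [4, 1, 5],    # actual class 2 (10 examples, the minority)
]


def per_class_metrics(cm):
    """Return overall accuracy and a {class: (precision, recall)} mapping."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    metrics = {}
    for c in range(k):
        tp = cm[c][c]
        predicted_as_c = sum(cm[r][c] for r in range(k))  # column sum
        actually_c = sum(cm[c])                           # row sum
        precision = tp / predicted_as_c if predicted_as_c else 0.0
        recall = tp / actually_c if actually_c else 0.0
        metrics[c] = (precision, recall)
    return accuracy, metrics


accuracy, metrics = per_class_metrics(confusion)
print(f"accuracy = {accuracy:.2f}")
for c, (precision, recall) in metrics.items():
    print(f"class {c}: precision = {precision:.3f}, recall = {recall:.3f}")
```

Here the overall accuracy is 0.75, while the minority class 2 has a recall of only 0.5, a gap that a single accuracy figure hides.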